Techniques for Estimating the Ideal Binary Mask
Abstract
This paper compares binary mask estimation techniques that differ in how they estimate the instantaneous SNR. The effect of six gain functions and three noise estimation algorithms on the SNR estimate, and subsequently on the binary mask, was assessed. New criteria are proposed for classifying time-frequency bins as belonging to the target or the masker signal. Sentences from the NOIZEUS corpus embedded in four types of noise at SNR levels of 0-10 dB were used for evaluation. Performance of the binary mask estimation algorithms was evaluated in terms of hit rate and false alarm rate. Results indicated that the choice of SNR estimation technique primarily affects the false alarm rate.
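As an illustration of the evaluation described above, the following is a minimal sketch, not the paper's implementation. It assumes the common convention that a time-frequency bin is labeled target-dominant when its local SNR exceeds a threshold (0 dB here, an assumed value), that the hit rate is the fraction of target-dominant bins in the ideal mask that the estimated mask also labels 1, and that the false alarm rate is the fraction of masker-dominant bins wrongly labeled 1. The function names `binary_mask` and `hit_and_false_alarm` are hypothetical.

```python
import math


def binary_mask(target_power, masker_power, threshold_db=0.0, eps=1e-12):
    """Return a 0/1 mask: 1 where the local SNR (dB) exceeds threshold_db.

    target_power and masker_power are equally shaped lists of rows holding
    per-bin power values. threshold_db and eps are assumed values.
    """
    mask = []
    for t_row, m_row in zip(target_power, masker_power):
        row = []
        for t, m in zip(t_row, m_row):
            # eps guards against division by zero and log of zero.
            snr_db = 10.0 * math.log10((t + eps) / (m + eps))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


def hit_and_false_alarm(ideal_mask, estimated_mask):
    """Return (hit rate, false alarm rate) of estimated_mask vs ideal_mask."""
    hits = target_bins = false_alarms = masker_bins = 0
    for i_row, e_row in zip(ideal_mask, estimated_mask):
        for ideal, est in zip(i_row, e_row):
            if ideal == 1:
                target_bins += 1
                hits += est
            else:
                masker_bins += 1
                false_alarms += est
    hit_rate = hits / target_bins if target_bins else 0.0
    fa_rate = false_alarms / masker_bins if masker_bins else 0.0
    return hit_rate, fa_rate


# Toy example with made-up power values (not data from the paper).
ideal = binary_mask([[4.0, 1.0, 9.0], [1.0, 1.0, 8.0]],
                    [[1.0, 4.0, 1.0], [2.0, 5.0, 2.0]])
estimated = [[1, 1, 1], [0, 0, 0]]
hit_rate, fa_rate = hit_and_false_alarm(ideal, estimated)
```

In a real comparison the ideal mask would come from the separately available clean and noise signals, while the estimated mask would come from an SNR estimate computed on the noisy mixture alone.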
Similar resources
ASR-Driven Binary Mask Estimation for Robust Automatic Speech Recognition
Additive noise has long been an issue for robust automatic speech recognition (ASR) systems. One approach to noise robustness is the removal of noise information through segregation by binary time-frequency masks; each time-frequency unit in a spectro-temporal representation of the speech signal is labeled either noise-dominant or signal-dominant. The noise-dominant units are masked and their e...
Factors influencing intelligibility of ideal binary-masked speech: implications for noise reduction.
The application of the ideal binary mask to an auditory mixture has been shown to yield substantial improvements in intelligibility. This mask is commonly applied to the time-frequency (T-F) representation of a mixture signal and eliminates portions of a signal below a signal-to-noise-ratio (SNR) threshold while allowing others to pass through intact. The factors influencing intelligibility of ...
Blind Dereverberation of Audio Signals
This project examines the problem of single channel blind dereverberation. After estimating the T60 value, a time-domain binary masking approach was used to remove regions of the signal that were largely dominated by reverberant energy. Performance of the system was examined for several different classes of audio (hand clapping, drums, and speech) and for varying amounts of reverberation. In ad...
A data-driven approach for estimating the time-frequency binary mask
The ideal binary mask, often used in robust speech recognition applications, requires an estimate of the local SNR in each time-frequency (T-F) unit. A data-driven approach is proposed for estimating the instantaneous SNR of each T-F unit. By assuming that the a priori SNR and a posteriori SNR are uniformly distributed within a small region, the instantaneous SNR is estimated by minimizing the l...
Estimation of the Ideal Binary Mask Using Directional Systems
The ideal binary mask is often seen as a goal for time-frequency masking algorithms trying to increase speech intelligibility, but the required availability of the unmixed signals makes it difficult to calculate the ideal binary mask in any real-life applications. In this paper we derive the theory and the requirements to enable calculations of the ideal binary mask using a directional system w...